Text File | 1991-09-24 | 43KB | 1,084 lines
Date: Fri, 21 Jun 91 14:07:05 CDT
From: robin@utafll.uta.edu (Robin Cover)
Message-Id: <9106212107.AA17356@utafll.uta.edu>
To: kirsch@usasoc.soc.mil
Subject: sed11
Cc: robin@utafll.uta.edu
Thanks to Eric Raymond and others who contributed to the 18K sed exec.
I have a couple questions - which might be as easily answered as having
a BSD UNIX manual (but I don't).
Scripts which worked with the GNUish sed don't work with the current
sed, namely, in the treatment of <CR> and <LF>. With the GNUish
version, one may enter 0D 0A (<CR><LF>) directly into a script and
get results; your current 18K version seems to accept the notation
\d13\d10 as equivalent to <CR><LF>, but I do not see this in the
man page. In fact, it has never been clear to me why the GNUish
utils for DOS do not (always?) predictably improve on the handling
of the 8th bit. Much of my text processing requires that I am
able to address control chars (0-31), hi-bit chars -- in a word,
all chars that wordprocessors reserve for their private purposes.
Standard UNIX is very unreliable in (usually NOT) allowing one to
address the full 8-bit ascii char set except for minor instances
of generosity, so I look to the DOS UNIX-lookalikes to solve the
problem.
Questions:
1) are decimal 10 and 13 the ONLY chars that can be addressed with "sed11"
using the convention \d13 ?
2) otherwise, will "sed11" faithfully handle all 256 chars, if they are
in scripts and text files?
3) is it not reasonable to think of pushing the buffer size beyond
4K (e.g., for the purpose of using tags in the pattern space)? The
major drawback of these utils is that they choke on long lines --
I am thinking of SGML files, for instance. Do you know of any
attempts (for 386 machines) to build grep/sed/awk to handle
LONG lines - approaching 32K, or more?
Thanks,
Robin Cover
-----------------------------------------------------------------------------
Robin Cover BITNET: zrcc1001@smuvm1 ("one-zero-zero-one")
6634 Sarah Drive Internet: robin@utafll.uta.edu ("uta-ef-el-el")
Dallas, TX 75236 USA Internet: zrcc1001@vm.cis.smu.edu
Tel: (1 214) 296-1783 Internet: robin@ling.uta.edu
FAX: (1 214) 841-3642 Internet: robin@txsil.sil.org
=============================================================================
Date: Sat, 22 Jun 91 16:14:58 EDT
From: David Kirschbaum
To: robin@utafll.uta.edu (Robin Cover)
Subject: Re: sed11
>Scripts which worked with the GNUish sed don't work with the current
>sed, namely, in the treatment of <CR> and <LF>. With the GNUish
>version, one may enter 0D 0A (<CR><LF>) directly into a script and
>get results; your current 18K version seems to accept the notation
> \d13\d10 as equivalent to <CR><LF>, but I do not see this in the
>man page.
I believe these instructions (from the sed11.man) apply to all commands:
l (2)
    List. Sends the pattern space to standard output. A "w" option may
    follow as in the s command below. Non-printable characters expand to:
        \b -- backspace (ASCII 08)
        \t -- tab (ASCII 09)
        \n -- newline (ASCII 10)
        \r -- return (ASCII 13)
        \e -- escape (ASCII 27)
        \xx -- the ASCII character corresponding to 2 hex digits xx.
Your CR or 0DH would be \0D and your LF would be \0A. Similar to, but
simpler than, C's \x0d and \x0a.
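A minimal C sketch of how the escapes in the excerpt above could be decoded into raw bytes. It is not taken from the sed11 source: the names `hex_value` and `decode_escapes` are hypothetical, and the rule that two hex digits after a backslash win over the named escapes (so `\be` would be read as the byte BE hex, not backspace plus `e`) is an assumption made only to keep the sketch unambiguous.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/*
 * Decode \b \t \n \r \e and \xx (two hex digits) into raw bytes.
 * Returns the number of bytes written to dst (no NUL is appended),
 * or (size_t)-1 if dst is too small.
 * ASSUMPTION: "\" followed by two hex digits is always read as \xx,
 * before the named escapes are tried. sed11 may resolve this differently.
 */
static size_t decode_escapes(const char *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        unsigned char out;

        if (src[0] == '\\' && isxdigit((unsigned char)src[1])
                           && isxdigit((unsigned char)src[2])) {
            out = (unsigned char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else if (src[0] == '\\' && src[1] != '\0') {
            switch (src[1]) {
            case 'b': out = 8;  break;   /* backspace */
            case 't': out = 9;  break;   /* tab */
            case 'n': out = 10; break;   /* newline */
            case 'r': out = 13; break;   /* return */
            case 'e': out = 27; break;   /* escape */
            default:  out = (unsigned char)src[1]; break; /* keep char as is */
            }
            src += 2;
        } else {
            out = (unsigned char)*src++;  /* ordinary byte, or a trailing "\" */
        }
        if (n >= cap)
            return (size_t)-1;
        dst[n++] = out;
    }
    return n;
}
```

Under these assumptions, decoding the script text `\0D\0A` yields the two bytes 13 and 10, that is <CR><LF>.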
> In fact, it has never been clear to me why the GNUish
>utils for DOS do not (always?) predictably improve on the handling
>of the 8th bit. Much of my text processing requires that I am
>able to address control chars (0-31), hi-bit chars -- in a word,
>all chars that wordprocessors reserve for their private purposes.
>Standard UNIX is very unreliable in (usually NOT) allowing one to
>address the full 8-bit ascii char set except for minor instances
>of generosity, so I look to the DOS UNIX-lookalikes to solve the
>problem.
Well, we have ordinary text file reads here. In DOS there's very little
processing that occurs during text file reads .. just EOL and EOF mainly.
I scanned sed11's source and found nothing to indicate that the 8th bit of
text characters is being tampered with.
Unfortunately, this is *not* "raw" mode in its crudest form, so you will
NOT be able to get *all* the control characters. But sed was, after all,
from the beginning intended to deal with TEXT. And you wish to step
outside those boundaries.
>Questions:
>
>1) are decimal 10 and 13 the ONLY chars that can be addressed with "sed11"
> using the convention \d13 ?
Donno .. I'm not familiar enough with sed and its command files to
experiment. It *looks* like any character could be used. But where'd you
get that "\dxx" business? I didn't see anything like that in the source,
and the man says just to use "\xx".
>2) otherwise, will "sed11" faithfully handle all 256 chars, if they are
> in scripts and text files?
It looks like it, except for ^Z (ASCII 26) and CR/LFs.
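The two exceptions follow from text-mode reads. DOS C runtimes of this era, Turbo C among them, typically collapse CR LF to LF and stop at the first Ctrl-Z when a file is opened in text mode. The sketch below emulates that on a byte buffer to show which byte values survive. `dos_text_read` is a hypothetical name, and whether sed11 relies on exactly this runtime behaviour is an assumption based on the "ordinary text file reads" remark above.

```c
#include <assert.h>
#include <stddef.h>

#define DOS_EOF 0x1A   /* Ctrl-Z, ASCII 26 */

/*
 * Emulate a typical DOS text-mode read on an in-memory buffer:
 * the CR of a CR LF pair is dropped, and Ctrl-Z ends the input.
 * out must have room for len bytes. Returns the number of bytes kept.
 * ASSUMPTION: this models common runtime behaviour, not sed11 itself.
 */
static size_t dos_text_read(const unsigned char *in, size_t len,
                            unsigned char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] == DOS_EOF)
            break;                  /* nothing after ^Z is ever seen */
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue;               /* drop the CR of a CR LF pair */
        out[n++] = in[i];           /* all other bytes pass unchanged */
    }
    return n;
}
```

In this model hi-bit bytes and the other control characters pass through untouched; only ASCII 26 and the CR of a CR LF pair are lost, which matches the answer given here.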
>3) is it not reasonable to think of pushing the buffer size beyond
> 4K (e.g., for the purpose of using tags in the pattern space)? The
> major drawback of these utils is that they choke on long lines --
> I am thinking of SGML files, for instance. Do you know of any
> attempts (for 386 machines) to build grep/sed/awk to handle
> LONG lines - approaching 32K, or more?
I tried, bumping the buffers up to 24K (needed malloc() to do that too).
It seemed to run just fine, but then choked&died on loooooong lines (about
14Kb). Don't know why, and don't plan to spend the time finding out!
Again, you're wandering beyond text files .. and there's no warrant for
sed to do things like that.
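One way to avoid a fixed pattern-space size is to let the buffer grow on demand. The following is a sketch only, not the change described above: `struct linebuf`, `linebuf_append`, and `linebuf_free` are hypothetical names, and the 4096-byte starting size simply mirrors the 4K figure in the question. It says nothing about why the 24K build failed at about 14Kb. On 16-bit DOS compilers `size_t` is typically 16 bits, so a single allocation stays under 64K and the overflow checks matter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct linebuf {
    char  *data;   /* heap storage, NULL until the first append */
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes allocated */
};

/*
 * Append n bytes, doubling the allocation as needed.
 * Returns 0 on success, -1 on size overflow or if memory runs out
 * (the existing contents stay valid in both failure cases).
 */
static int linebuf_append(struct linebuf *lb, const char *src, size_t n)
{
    assert(lb != NULL && src != NULL);
    if (n == 0)
        return 0;
    if (n > (size_t)-1 - lb->len)
        return -1;                       /* len + n would wrap around */
    if (lb->len + n > lb->cap) {
        size_t want = lb->cap ? lb->cap : 4096;   /* assumed initial size */
        char *p;

        while (want < lb->len + n) {
            if (want > (size_t)-1 / 2)
                return -1;               /* doubling would wrap around */
            want *= 2;
        }
        p = (char *)realloc(lb->data, want);
        if (p == NULL)
            return -1;
        lb->data = p;
        lb->cap = want;
    }
    memcpy(lb->data + lb->len, src, n);
    lb->len += n;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void linebuf_free(struct linebuf *lb)
{
    free(lb->data);
    lb->data = NULL;
    lb->len = lb->cap = 0;
}
```

A caller would start from `struct linebuf lb = { NULL, 0, 0 };`, append each chunk read from the input, and check the return value every time so that an over-long line fails cleanly.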
Sorry I can't be more help, but I am *not* a bona fide sed writer or
developer. I only did a hack to port it to Turbo C v2.0, and have no
plans to enhance or modify sed in any other way.
David Kirschbaum
Toad Hall
kirsch@usasoc.soc.mil
Return-Path: <bsu-cs!mdlawler@iuvax.cs.indiana.edu>
Date: Sun, 23 Jun 91 16:08:09 -0500
From: mdlawler@bsu-cs.bsu.edu (Michael D. Lawler)
Message-Id: <9106232108.AA01077@bsu-cs.bsu.edu>
To: kirsch@usasoc.soc.mil
Subject: sed
Here are the messages from Borland C++ 2.0 on the sed that you just put on
simtel. Do I need to worry about any of them and if so and you make diffs
will you please send them to me? Also can sed be a com file or do you know?
Borland C++ Version 2.0 Copyright (c) 1991 Borland International
sedcomp.c:
Warning sedcomp.c 221: Function should return a value in function main
Warning sedcomp.c 761: Function should return a value in function gettext
Warning sedcomp.c 831: Constant out of range in comparison in function ycomp
sedexec.c:
Warning sedexec.c 256: Function should return a value in function selected
Warning sedexec.c 472: Possibly incorrect assignment in function dosub
Turbo Link Version 4.0 Copyright (c) 1991 Borland International
Available memory 155200
Return-Path: <bsu-cs!mdlawler@iuvax.cs.indiana.edu>
Date: Sun, 23 Jun 91 19:21:30 -0500
From: mdlawler@bsu-cs.bsu.edu (Michael D. Lawler)
Message-Id: <9106240021.AA03140@bsu-cs.bsu.edu>
To: kirsch@usasoc.soc.mil
Subject: another sed question
Assuming that I have both cat and uudecode how do I get this script to work
with the sed that you just uploaded to simtel?
#! /bin/sh
cat $* | sed '/^END/,/^BEGIN/d'| uudecode
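For reference, what the address range in that script does can be sketched in C. The command `/^END/,/^BEGIN/d` deletes every line from one that starts with END through the next one that starts with BEGIN, inclusive, which removes the trailer of one mailed part and the header of the next before uudecode sees the data. `starts_with` and `range_delete` are hypothetical helpers, and plain prefix matching stands in for sed's regular expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if line begins with prefix, i.e. matches the regex ^prefix. */
static int starts_with(const char *line, const char *prefix)
{
    return strncmp(line, prefix, strlen(prefix)) == 0;
}

/*
 * Copy into kept[] the lines that '/^END/,/^BEGIN/d' would let through.
 * kept must have room for n pointers. Returns the number of lines kept.
 * The closing address is only tested on lines after the opening one.
 */
static size_t range_delete(const char **lines, size_t n, const char **kept)
{
    size_t i, out = 0;
    int in_range = 0;

    assert(lines != NULL && kept != NULL);
    for (i = 0; i < n; i++) {
        if (in_range) {
            if (starts_with(lines[i], "BEGIN"))
                in_range = 0;       /* the closing line is deleted too */
            continue;
        }
        if (starts_with(lines[i], "END")) {
            in_range = 1;           /* the opening line is deleted */
            continue;
        }
        kept[out++] = lines[i];
    }
    return out;
}
```

Matching is case-sensitive, so a lowercase uuencode `end` line is not taken for the END marker.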
Date: Mon, 24 Jun 91 13:42:09 EDT
From: David Kirschbaum
To: mdlawler@bsu-cs.bsu.edu (Michael D. Lawler)
Subject: Re: sed
>Here are the messages from Borland C++ 2.0 on the sed that you just put on
>simtel. Do I need to worry about any of them and if so and you make diffs
>will you please send them to me? Also can sed be a com file or do you know?
Well, the C++ complaints look *almost* like my TC 2.0 ones ...
>sedcomp.c:
>Warning sedcomp.c 221: Function should return a value in function main
Yeah, yeah, it wants a "return(0)" to keep from bitching. The function
exits with other values B